ABSTRACT

State of the art component-based software collections, such
as FOSS distributions, are made of up to tens of thousands
of components, with complex inter-dependencies and
conflicts. Given a particular installation of such a system,
each request to alter the set of installed components has
potentially (too) many satisfying answers.

We present an architecture that allows users to express
advanced preferences about package selection in FOSS
distributions. The architecture is composed of a
distribution-independent format for describing available and
installed packages, called CUDF (Common Upgradeability
Description Format), and a foundational language for
specifying optimization criteria, called MooML. We present
the syntax and semantics of CUDF and MooML, and discuss the
partial evaluation mechanism of MooML, which helps package
dependency solvers gain efficiency.

Categories and Subject Descriptors 

K.6.3 [MANAGEMENT OF COMPUTING AND IN- 
FORMATION SYSTEMS]: Software Management— Soft- 
ware selection; D.2.9 [SOFTWARE ENGINEERING]: 

Management — Life cycle 

General Terms 

Design, Languages, Management 

Keywords 

FOSS, upgrade, packages, selection, preferences 

1. INTRODUCTION 

One of the noteworthy characteristics of FOSS (Free and
Open Source Software) distributions, such as Debian
GNU/Linux, Red Hat Enterprise Linux, or FreeBSD, is the
availability of large numbers of components (usually called
packages in this environment) that can be installed,
removed, and upgraded as single entities. Systems like
Debian can have tens of thousands of components, growing
steadily across releases and linked by complex
inter-dependencies [1]. Similar architectures exist in other
contexts where components are used to define the granularity
at which software can be deployed: analogues of FOSS
packages can be found for example in the Eclipse [3] and
Maven¹ platforms; in both cases the number of components and
their inter-relationships are similar to what exists in
common FOSS distributions.

In all such scenarios, user installations are managed using
tools such as package managers, which receive user requests
to change the installation in some way (e.g. install a new
component) and try to satisfy them, equipped with the
knowledge of where to find components and how they relate to
one another. When the number of components grows, a given
user request can have thousands of acceptable solutions. For
instance, in satisfying the simple "install wordpress"
request a package manager can be faced with questions like:
"which version of wordpress should be installed?", "using
which web server?", "relying on which PHP implementation?",
etc. The number of potential solutions for the final user
can easily grow exponentially; currently, the actual choice
depends on internal heuristics implemented by specific
package managers and is customizable only in ad-hoc ways.

This paper focuses on FOSS distributions and presents an
architecture to specify advanced user preferences in that
context, abstracting over package manager specific details.
The architecture is composed of two parts: a format to
describe upgrade scenarios called CUDF (Common
Upgradeability Description Format) and a foundational
language to encode user preferences called MooML (MancOosi
Optimization Meta-Language).

The MooML language is foundational in the sense that it is
not (necessarily) meant to be a language for the end user or
the system administrator. It is rather meant as an
intermediate language with a precise semantics: developers
of installation tools can use it as an abstract input
language for expressing user preferences, and it can serve
as the target language for representing choices that a user
has expressed, for instance through a graphical interface.
MANCOOSI, which gives MooML its name, is an ongoing project
which aims, among other things, to develop better algorithms
and tools to plan upgrade paths based on various information
sources about software packages and on optimization
criteria [6].

Paper structure. 

The remainder of this section outlines the upgrade process
FOSS packages are subject to. Section 2 presents common
optimization criteria for package upgrade scenarios; these
criteria will serve as running examples throughout the
paper. Section 3 summarizes essential features of the CUDF
language for describing upgrade scenarios. MooML itself and
its partial evaluation mechanism are presented in Sections 4
and 5, respectively.

1.1 FOSS Package Upgrade Generalities 

Packages. 

FOSS (binary) distributions are organized as collections 
of packages, i.e. abstractions defining the granularity at 
which users can act (add, remove, upgrade, etc.) on in- 
stalled software. Abstracting over format-specific details, a 
package is a bundle of the 3 parts depicted in Figure 1. 



Package
  1. Set of files
     1.1. Configuration files
  2. Set of valued meta-information
     2.1. Inter-package relationships
  3. Executable configuration scripts

Figure 1: Constituents of a package.

The set of files (1) represents what the package is deliver- 
ing: executable binaries, data, documentation, etc. This 
set includes configuration files (1.1), that affect the run- 
time behavior of the package and are meant to be locally 
customized. Package meta-information (2) contains infor- 
mation varying from distribution to distribution. A com- 
mon core provides: its name (a unique identifier), a version 
(taken from a totally ordered set), maintainer and pack- 
age description, and most notably inter-package relation- 
ships (2.1). The kinds of relationship vary with the package 
manager used, but there exists a de facto common subset 
including dependencies (the need of other packages to work 
properly), conflicts (the inability of being co-installed with 
other packages), feature provisions (the ability to declare
named features as provided by a given package, so that other
packages can depend on them), and restricted boolean
combinations of them [8]. Finally, packages come with a set of
executable configuration (or maintainer) scripts (3). Their 
purpose is to let package maintainers attach actions to hooks 
executed by the installer; actions are used to finalize package 
configuration during deployment. 

Upgrades. 

A distribution is a collection of packages. The subset of a
distribution corresponding to the packages actually installed
on a machine is called package status and is meant to be
altered using a package manager. An upgrade scenario is the
situation in which a user, typically the system
administrator, submits a user request to the package manager
with the intention of altering the package status. Several
entities and problems are involved in an upgrade scenario,
and should be captured by a complete description of it [7].
The main entities are packages and the most relevant problem
for the present paper is upgrade planning; both are briefly
described below.

# apt-get install aterm
----------------------------------------------------------
Reading package lists... Done
Building dependency tree... Done
The following extra packages will be installed:
  libafterimage0
0 upgraded, 2 newly installed, 0 to remove and
  1786 not upgraded.
Need to get 386kB of archives.
807kB of additional disk space will be used.
----------------------------------------------------------
Get: 1 http://ftp.debian.org libafterimage0 2.2.8
Get: 2 http://ftp.debian.org aterm 1.0.1-4
Fetched 386kB in 0s (410kB/s)
----------------------------------------------------------
Selecting package libafterimage0.
(Reading database ... 294774 files and dirs ...)
Unpacking libafterimage0 ...
Selecting package aterm.
Unpacking aterm (aterm_1.0.1-4_i386.deb) ...
----------------------------------------------------------
Setting up libafterimage0 (2.2.8-2) ...
Setting up aterm (1.0.1-4) ...

Table 1: The package upgrade process. Horizontal
lines separate the phases described in the text.

Table 1 summarizes the different phases of the upgrade 
process, using as an example the popular apt-get package 
manager (others follow a similar process). Phase (1) is a user 
specification of how the package status should be altered. 
The expressiveness of the request language varies with the 
package manager: it can be as simple as requesting the in- 
stallation/removal of a single package by name, or can also 
enable limited expression of per-package preferences such 
as APT pinning [11]. Phase (2) (dependency resolution)
checks whether a package status satisfying all dependencies
and the user request exists; it has been shown that this
problem is at least NP-complete [8]. If such a status exists,
one is chosen, trying to satisfy user preferences if any, and
is called the solution. Deploying a new status corresponding
to the solution consists of package retrieval (3) and
unpacking (4), possibly intertwined with several
configuration phases (5) where maintainer scripts get
executed.

Various challenges related to the upgrade process still need 
to be properly addressed. An example of a very practi- 
cal challenge is the need to provide transactional upgrades, 
offering the possibility to roll back in case an unexpected 
(and unpredictable in general) failure is encountered during 
upgrade deployment [7]. Other challenges concern upgrade 
planning. For instance, dependency resolution can fail either 
because the user request is unsatisfiable (e.g., user error or 
inconsistent distributions [9]) or because the package man- 
ager is unable to find a solution. Completeness — the guar- 
antee that a solution will be found whenever one exists — is 
a desirable package manager property [15], unfortunately 
missing in most package managers, with too few claimed 
exceptions [10, 17]. 

User Preferences. 

While complete dependency solving techniques are now
well-known [9] and "just" lack widespread adoption, handling
complex user preferences is a novel problem for software
upgrade, and is the main concern of this paper. It boils
down to letting users specify what constitutes the "best"
solution among all acceptable solutions, and providing
mechanisms to find it efficiently. Examples of preferences
are policies [10, 16], like minimizing the download size or
prioritizing popular packages, and also more specific
requirements such as blacklisting packages maintained by an
untrusted maintainer.

The first necessary step to attack the problem is devising
a way to encode user preferences flexibly, without hindering
the package manager's ability to respect them. A
prerequisite for that is a rigorous description of upgrade
scenarios, on top of which the meaning of user preferences
will be defined.

2. USER PREFERENCE SCENARIOS 

In the following we will consider several possible scenarios 
where user needs can be better encoded as user preferences 
in MooML. The actual encoding in MooML will be pre- 
sented in Section 4.2, after a more in depth presentation of 
the language. 

Size Minimizing the total size consumed by the package
installation is a very basic optimization criterion
and a frequent need of package managers for embedded
systems.

Freshness Preferring more recent package versions over 
older package versions is also very common, and hard- 
wired in most package managers. The hard-wiring in 
Debian's APT as a hard constraint is the main cause 
for the incompleteness of its dependency solving abili- 
ties. 

Pinning To avoid forcing the choice of the most recent
version of a package in all cases, APT makes it possible to
specify different choices for specific packages by means of
a mechanism called pinning [11]. In essence, pinning
consists in specifying integer score values (called
priorities) for individual packages based on patterns of
package names, package versions, and origin; among
all the versions of a given package, the one with the
highest priority gets chosen. By default, priority follows
versions. This is an example of "local" preferences
that apply to particular packages, in contrast to uniform
constraints like total installation size.

Security updates should usually have the highest priority
when choosing which packages to upgrade.
We will demonstrate MooML's multi-criteria capabilities
by stating that maximizing security updates has
priority over package freshness.

Multiple packages Some package managers, most notably 
rpm, allow for multiple versions of the same package to 
be installed; while this is an interesting property, one 
might want to automatically "clean up" useless multi- 
ple installations. This scenario will show how to mini- 
mize the number of packages that are installed in mul- 
tiple versions. 

Note that, while they are presented separately for the sake
of brevity, scenarios are not mutually exclusive in practice.
In our vision, some optimization criteria will constitute a
default configuration of a given package manager (e.g.
always prioritizing security upgrades, avoiding package
downgrades, etc.) while others will be added by users by
means of specific user interfaces. Even when the latter
possibility is not exploited, there are advantages in
externalizing preferences which are currently hard-wired in
solving algorithms: for instance, users will be able to
override them, and it will be easier to share optimizers
among distributions.

MooML makes it possible to combine multiple optimization
criteria; however, one has to specify a hierarchy among
them. For instance, one can require a solution that is
minimal in size first and then, among all the solutions that
are minimal in size, one with maximal freshness. It is not
possible to optimize two independent criteria at the same
time, since in that case an optimal solution might not
exist.
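The hierarchy among criteria can be illustrated with a minimal Python sketch. It is not MooML and not taken from the paper: the candidate solutions, their `size` and `freshness` measures, and the `rank` helper are all hypothetical. It only shows how "minimal size first, then maximal freshness" becomes a single lexicographic comparison.

```python
# Hypothetical candidate solutions, each summarised by two measures.
# Names and numbers are illustrative only.
candidates = [
    {"name": "s1", "size": 120, "freshness": 3},
    {"name": "s2", "size": 100, "freshness": 1},
    {"name": "s3", "size": 100, "freshness": 4},
]


def rank(solution):
    # Size is minimised first. Freshness is maximised second, so it is
    # negated to fit one ascending, tuple-based (lexicographic) order.
    return (solution["size"], -solution["freshness"])


best = min(candidates, key=rank)
print(best["name"])  # s3
```

Here s2 and s3 tie on size, and freshness decides between them; s1 is never considered, however fresh it is, because size ranks higher in the hierarchy.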

As the next section will explain, optimization criteria
cannot compromise the correctness of a solution, e.g. by
allowing two conflicting packages to be installed at the
same time.

3. DESCRIBING UPGRADE SCENARIOS 

State of the art mechanisms for specifying user preferences
highlighted so far [10, 17, 11] suffer from two main
drawbacks: they are package manager specific, and they are
not expressive enough to encode all our user preference
scenarios (see Section 2). The first step we pursue in
addressing these shortcomings is devising a rigorous format
in which upgrade scenarios can be encoded; a user preference
language will then be developed on top of such a format (see
Section 4). The format is called CUDF (Common Upgradeability
Description Format). The specification of CUDF [14] was
guided by some general design principles:

Be distribution agnostic One of the main purposes of 
CUDF is being a common format to encode upgrade 
scenarios coming from heterogeneous environments. 
As a consequence, CUDF is agnostic to distribution 
specific details such as the used package system or 
package manager. 

Stay close to the original problem While there are several
possible encodings of upgrade scenarios [9], CUDF
aims to be as close as possible to the original prob- 
lem, in order to preserve the ability for humans to 
understand the pre-CUDF upgrade scenario, and ease 
interoperability with legacy package managers. 

Extensibility Core package properties (e.g. name, version,
dependencies, ...) are shared by all distributions
and essential to grasp the meaning of upgrade scenarios.
Other auxiliary properties are not, but might
be the subject of user preferences (e.g., minimize the
number of "buggy" packages, according to distribution
specific notions of bugginess). In order not to hinder
the possibility of expressing such user preferences on top
of CUDF, the format allows extra package properties to be
specified beyond those prescribed by the format
specifications.

Transactional semantics The point of view of CUDF is 
upgrade planning: the notion of correctness of a solu- 
tion with respect to an upgrade scenario expressed in 
CUDF is global and does not express the package de- 
ployment steps needed to pass from the starting pack- 
age status to the final one. Such steps are more low- 
level, and mostly uninteresting for user preferences. 



Plain text format Technically, CUDF aims at being simple
to parse and to generate. The reasons are the generality of
the user preference problem and the desire to make the
format popular among different distributions. As plain text
is the universal encoding for information interchange
formats in FOSS communities [12], using a plain text format
makes it easy for package manager developers to adapt tools
to CUDF.

3.1 CUDF Syntax 

The CUDF encoding of an upgrade scenario is called a
CUDF document. Every such document has an abstract logical
structure, a formal meaning, and a serialized form as a
plain text file. The logical structure of a CUDF document,
sketched in Figure 2, is based on stanzas, which are
collections of key-value pairs called properties. Values
are typed within a simple type system containing basic data
types (e.g. integers, booleans, and strings) and more
complex, package-specific data types such as boolean
formulae over versioned packages used to represent
inter-package dependencies.



preamble (optional)
package description 1
package description 2
...
package description n
request description

Figure 2: Overall structure of a CUDF document.



CUDF documents contain one package description stanza 
for each package known to the package manager; collectively 
they represent the package universe. This means that both 
installed and non-installed (but available) packages are rep- 
resented in the same way in the same document, in con- 
trast to current package installation systems which often 
distribute this information over different files using differ- 
ent syntactic representations. 

Package description stanzas are based on a core set of
properties (sometimes optional, but always with default
values), the most important of which are: package and version
(which unambiguously identify packages), depends and
conflicts (which express the dependencies and conflicts that
must be respected for the package to be properly installed),
provides (which expresses versioned features that the
current package provides for other packages to depend on or
conflict with), and installed (which states whether the
current package is installed or not).

Figure 3 shows the serialization of a sample CUDF document.
As stanzas are separated by blank lines, the central part of
the figure shows three package description stanzas, each
starting with the package property, where both core and
extra properties are used. The latter must be declared in
the optional preamble stanza, which starts the document in
Figure 3. The ability to declare extra properties accounts
for extensibility and also makes it possible to statically
verify the syntactic correctness of CUDF documents. The
bottom part of Figure 3 shows the request description stanza,
where the user request is expressed. In its minimal form,
such a stanza is used to express which packages the user
wants to install, remove, or upgrade (using the homonymous
properties), possibly specifying version requirements.

preamble:
property: suite: enum(stable,unstable) = \
  "stable"
property: bugs: int = 0

package: car
version: 1
depends: engine, wheel, door, battery
installed: true
bugs: 183

package: bicycle
version: 7
suite: unstable

package: gasoline-engine
version: 1
depends: turbo
provides: engine
conflicts: engine, gasoline-engine
installed: true

request:
install: bicycle, gasoline-engine
upgrade: door, wheel > 2

Figure 3: Sample CUDF document.
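To make the stanza structure concrete, here is a minimal Python sketch of a reader for the plain-text serialization. It is not libcudf and it makes simplifying assumptions: values are kept as raw strings, and the preamble, property typing and line continuations are ignored. The function name `parse_stanzas` and the sample text are illustrative only.

```python
def parse_stanzas(text):
    """Split a CUDF-like text into stanzas, each a dict of properties."""
    stanzas, current = [], {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # A blank line closes the stanza being read, if any.
            if current:
                stanzas.append(current)
                current = {}
            continue
        # Everything before the first colon is the property name.
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    if current:
        stanzas.append(current)
    return stanzas


SAMPLE = """\
package: car
version: 1
depends: engine, wheel, door, battery
installed: true

package: bicycle
version: 7

request:
install: bicycle
"""

stanzas = parse_stanzas(SAMPLE)
packages = [s for s in stanzas if "package" in s]
request = next(s for s in stanzas if "request" in s)
print([p["package"] for p in packages])  # ['car', 'bicycle']
print(request["install"])                # bicycle
```

Installed and available packages come out of the same loop in the same shape, which is the point of keeping the whole package universe in one document.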

The example lacks the encoding of user preferences. This 
lack, which was our initial motivation for the work reported 
here, can be filled by an optional property specifiable in the 
request stanza, called preferences. Its content is a MooML 
program, discussed in the next section. What is relevant 
here is that MooML programs may be part of CUDF doc- 
uments and will be able to express preferences referencing 
CUDF stanzas. 

3.2 CUDF Semantics 

Given that a CUDF document completely describes an
upgrade scenario, what constitutes its meaning? Intuitively,
an upgrade scenario poses a challenge for the package
manager; its solutions are new package statuses. The
meaning, or semantics, of a CUDF document is hence a
characterization of all valid solutions matching the upgrade
scenario. We recall that a package status is just a set of
packages contained in the package universe, which we know
is fully encoded in the document. On that basis, we declare
that a solution is valid if and only if:

1. all installed packages have their dependencies satisfied,
i.e. installed as well (abundance);

2. no two packages that are in conflict are installed
together (peace);

3. the user request is satisfied by installed packages
(correctness).

The first two points have been previously formalized relying
on an encoding in propositional logic [9]. That encoding
fails to respect the design principle of staying close to
the original problem since, for example, packages with the
same name and different versions are treated as unrelated
boolean variables in the encoding. The formal semantics
of CUDF characterizes all valid solutions corresponding to a
given CUDF document as a binary relation among package
statuses, indexed by the user request. We will not give the
full details here, for which the reader is referred to [14],
but rather only discuss the peculiarities of CUDF formal
semantics and its differences with respect to previous
encodings.

An important semantic difference between existing pack- 
age management systems in FOSS distributions is whether 
they a priori allow packages to be installed in multiple ver- 
sions (like rpm does) or not (like dpkg). CUDF semantics 
here follows the rpm philosophy of a priori allowing multiple 
versions of a package to be installed at the same time. To 
encode Debian-like upgrade scenarios, where different ver- 
sions of the same package are forcibly in conflict, a special 
case of conflicts semantics is exploited, namely: self-conflicts 
are ignored. Hence, in Figure 3, all packages (potentially) 
appearing in multiple versions declare an (unversioned) con- 
flict with themselves, as it happens for gasoline-engine. 
The semantics ensures that such conflicts are ignored for 
the very same version of the package (otherwise those pack- 
ages will be useless) but take effect on different versions of 
gasoline-engine, granting that only one version of it can 
be installed. Such a semantics is coherent with self-conflicts 
on virtual packages, which can be exploited to ensure mu- 
tual exclusions among different providers of the same fea- 
ture. For instance, three packages like postfix, sendmail, 
and qmail, all providing the mail-transport-agent feature, 
can be made mutually exclusive by having all of them both 
provide and conflict with mail-transport-agent. 
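The abundance and peace conditions, including the rule that self-conflicts are ignored, can be sketched in a few lines of Python. This is a simplified illustration, not the formal semantics of [14]: dependencies, conflicts and provided features are plain names without version constraints or boolean structure, and the helper names are hypothetical.

```python
def matches(pkg, name):
    """True if pkg is called `name` or provides the feature `name`."""
    return pkg["package"] == name or name in pkg.get("provides", ())


def is_abundant(installed):
    """Every dependency of an installed package is met by an installed one."""
    return all(
        any(matches(q, name) for q in installed)
        for p in installed
        for name in p.get("depends", ())
    )


def is_peaceful(installed):
    """No installed package conflicts with a *different* installed package."""
    return not any(
        matches(q, name)
        for p in installed
        for name in p.get("conflicts", ())
        for q in installed
        if q is not p  # self-conflicts are ignored
    )


mta = "mail-transport-agent"
postfix = {"package": "postfix", "version": 1,
           "provides": [mta], "conflicts": [mta]}
sendmail = {"package": "sendmail", "version": 1,
            "provides": [mta], "conflicts": [mta]}

print(is_peaceful([postfix]))            # True
print(is_peaceful([postfix, sendmail]))  # False
```

Because the check skips only the very same package record, two versions of a package that both declare an unversioned conflict with their own name are rejected together, while each of them remains installable on its own.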

Finally, feature provision via provides is versioned, mean- 
ing that specific versions of a given feature can be provided. 
Not specifying a version — as in provides: foo — is inter- 
preted as providing all versions of the foo feature. 

Equipped with all this, verifying the satisfaction of a user
request boils down to reusing the notions of peace and
abundance: an install request is satisfied if and only if the
same line, considered as a dependency, would be satisfied
(abundance); a remove request is satisfied if and only if a
corresponding conflict is unsatisfied, i.e. no installed
package matches it (peace). Only upgrade needs some caution:
in principle it can be handled as install, but it
additionally requires that all packages mentioned in the
user request are installed in a single version. Furthermore,
after upgrading some package, the installed version of that
package must be at least as new as any previously installed
version of it.
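The extra conditions attached to upgrade can be written down as a small Python check. This is a sketch under simplifying assumptions: package statuses are lists of (name, version) pairs with integer versions, version constraints in the request (such as `wheel > 2`) are ignored, and `upgrade_ok` is a hypothetical name.

```python
def upgrade_ok(name, before, after):
    """Check the upgrade conditions for one package name.

    `before` and `after` list the (name, version) pairs installed
    in the initial status and in the proposed solution.
    """
    old = [v for n, v in before if n == name]
    new = [v for n, v in after if n == name]
    if len(new) != 1:
        # The package must end up installed, and in a single version.
        return False
    # The new version must not be older than any previously installed one.
    return all(new[0] >= v for v in old)


print(upgrade_ok("door", [("door", 1)], [("door", 2)]))               # True
print(upgrade_ok("door", [("door", 2)], [("door", 1)]))               # False
print(upgrade_ok("door", [("door", 1)], [("door", 1), ("door", 2)]))  # False
```

The third call fails even though a newer version is present, because two versions of the package remain installed.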

CUDF also makes it possible to express that a particular
package must not be removed, that it must be kept in its
current version, or that its functionality must be provided
by some package (see [14] for details).

3.3 CUDF Implementations 

CUDF has already seen various implementations. The
first implementation, libcudf, is the "reference"
implementation of the CUDF specifications and has been
developed by one of the authors of this paper. libcudf is a
library able not only to parse and pretty print CUDF
documents, but also to verify the CUDF semantics. This latter
feature can be exploited in two ways:

1. given a CUDF document, libcudf can verify whether 
the contained package status is consistent, i.e., whether 
abundance and peace are verified for all its packages; 



2. given a CUDF document and an encoding of a poten- 
tial solution, libcudf can verify whether the solution 
is valid, i.e., abundance + peace + user-request satis- 
faction. 

libcudf comes with the cudf-check command line tool
which provides the above two features out of the box. The
library is Free Software and can be used from both the OCaml
and C programming languages; it is available for download
at http://www.mancoosi.org/software/.

The authors are aware of other CUDF implementations.
Some of them are being developed within MANCOOSI to
convert distribution-specific upgrade scenario descriptions
into CUDF, so that a cross-distribution corpus of upgrade
scenarios can be formed. They will be released shortly at
least for the following distributions: Mandriva,
CaixaMagica, Debian GNU/Linux. Using such tools we have
verified that the average size of an upgrade scenario encoded
in CUDF is linear in the size of the original package manager
information and usually smaller.²

Another independent implementation is already available
in CUPT³, a new APT-compatible package manager for Debian.
In CUPT, CUDF is used as a syntactic format to pipe
upgrade scenarios to external solvers, so that upgrade
planning can be decoupled from other package manager
activities. Such a choice also makes it easier to share
dependency solvers not only within and across
distributions, but also with the scientific community.

4. EXPRESSING USER PREFERENCES 

Having a rigorous description of upgrade scenarios, we
can now devise our language to express user preferences.
Our proposal for such a language, MooML (for MancOosi
Optimization Meta⁴-Language), is described in this section.
The design of the language needs to face two requirements
that appear to be in mutual conflict:

Simplicity Programs written in MooML have to be interpreted
by solver tools that will try to satisfy user preferences.
Hence they should be as simple as possible in
order to minimize the burden put on the developers of
these tools.

Expressivity The MooML language should make it possible to
express sophisticated optimization criteria and be expressive
enough to encode the scenarios we have discussed.

The right choice of a language was to be found between
two extremes. One extreme is a Turing-complete programming
language with rich user-defined data structures and
function definitions through unrestricted recursion. This
extreme would provide maximum expressivity by definition,
but would require tool developers to integrate an interpreter
for a full-fledged programming language. The other extreme
is a restricted language allowing only for simple
combinations of optimization criteria, chosen from a limited
set of common simple criteria. This extreme would probably
make life easy for tool implementers, but would be too
limited in expressivity. It would also bear the risk of
having to continuously extend the set of predefined
optimization criteria.

2 E.g. on a large Debian installation, using both testing and
unstable package repositories for about 45,000 packages, the
package manager information on disk amounts to 14 MB and
the corresponding CUDF document has 9 MB.
3 http://wiki.debian.org/Cupt
4 The meta is inherited from the ML family of languages; for
our purpose there is no distinguished meta level.

program ::= ( let x = e )*                  definition
            ( constraint e )?               constraint
            ( (minimize | maximize) e )*    criteria

Figure 4: Syntax of MooML programs

In order to find the right balance between these two ex- 
tremes we made the following design choices for MooML: 

• MooML allows hard constraints, which must be satisfied by
any "user-approved" solution, to be specified separately
from optimization criteria.

• MooML does not allow an algorithm that compares
alternative solutions to be programmed directly⁵. Instead,
the language allows one to define how to compute a
measure of solution quality. Two possible solutions are
compared by comparing their respective measures. A
MooML program specifies the polarity of each measure
(i.e., whether it should be minimised or maximised).
In case several measures are defined, the program
defines a strict priority hierarchy (technically this
is a lexicographic combination of orders).

• MooML is a strongly typed functional language allowing
for polymorphic types and inference of principal
types.

• MooML does not allow for arbitrary use of recursion,
and is deliberately not Turing complete. Instead it
provides a generic fold-like iterator over lists, which
makes it possible to program primitive recursive functions
over lists.

• MooML does not allow custom data types to be defined.

• MooML does not have a mechanism to catch exceptions
but allows execution errors to be expressed.

4.1 MooML Programs 

The high-level syntax and structure of a MooML program
is sketched in Figure 4. Such a program starts with a
series of preparatory global definitions, meant to be reused
in the remainder of the program, followed by two main
parts. The first is a constraint, that is a boolean
expression which, when it evaluates to true, indicates a
solution considered acceptable by the user. Using a
constraint, users can exclude solutions that, in spite of
being valid with respect to CUDF semantics, are undesirable
for them. The second part is a list of optimization
criteria, i.e. expressions of the language returning
integers and tagged with a request to either minimize or
maximize them over all otherwise valid solutions.

expressions e ::=
  x                            variable
  c_v                          literal
  fun x -> e                   abstraction
  e e                          application
  ()                           unit
  (e1, ..., en)                tuple
  {l1 = e1, ..., ln = en}      record
  []                           empty list
  e :: e                       list
  e.l                          projection
  let p = e1 in e2             let binding
  match e with ... => en       pattern matching

patterns p ::=
  x                            variable
  c_v                          constant
  ()                           unit
  (p1, ..., pn)                tuple
  {l1 = p1, ..., ln = pn}      record
  'l                           enumeration
  []                           empty list
  p1 :: p2                     list
  _                            wildcard

CUDF literals c_v ::=
  true | false                 booleans
  ... | -1 | 0 | 1 | ...       integers
  "s"                          strings
  'l                           enumerations
  ...                          formulae, ...

Figure 5: Syntax of MooML expressions

5 as it happens, for example, with the sort function provided
by the standard library of several programming languages

The syntax of MooML expressions, as given in Figure 5,
has features borrowed from common functional programming
languages. Expressions sport rich types such as records,

tuples and lists defined on top of the basic CUDF types, 
as well as expressive constructs such as pattern matching 
and (non-recursive) local definitions. The evaluation of a 
MooML program is a straightforward ML-style evaluation [4] 
with pattern matching [2]; overall it boils down to evalu- 
ate the constraint and optimization criteria expressions in 
an evaluation environment enriched with global definitions. 
Additionally, the environment is also enriched with: 

• the MooML standard library, which provides the usual 
kit of functional programming functions and in partic- 
ular the fold iterator (and some of its derivatives, like 
map, and filter) without which iterating over list data 
structures would be impossible within the language; 

• the package universe u denotes a list of records 
representing all the package description stanzas of the 
CUDF document from which the MooML program 
originated. Each record contains one field for each 
package property, and can therefore be properly typed 
given the CUDF preamble. However, the 
installed property gets split into two new properties: 

was-installed (the same as the original installed, 
renamed for clarity) denotes whether the owning 
package was installed in the upgrade scenario 
presented to the package manager 

is-installed denotes whether the owning package is 
installed in the proposed solution in the context 
of which the MooML program is being evaluated 

• the user request r denotes a record corresponding to 
the user request stanza of CUDF. 

The only way to express iteration over lists is to use the 
predefined fold function. An expression 

(fold f [a_n; ...; a_1] a_0) 

is evaluated as 

(f a_n (f a_(n-1) ... (f a_1 a_0) ... )) 

An alternative way to describe its semantics is the following 
iterative pseudo-code: 

r := a_0; 
foreach i in 1..n do r := (f a_i r); 
return r; 

For instance, the standard library contains a definition of 
the sum function to sum up a list of integers: 

let sum l = fold add l 0 

and other functions like filter, map, max, etc. acting on lists 
can easily be defined the same way. 
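
The fold semantics described above can be mirrored in a few lines of Python. This is only an illustrative sketch, not MooML itself: the helper names sum_list, map_list, and filter_list are assumptions made for the sketch, and the list is assumed to be written as [a_n; ...; a_1], so that a_1 is its last element.

```python
def fold(f, items, a0):
    """Evaluate (fold f [a_n; ...; a_1] a_0) as f(a_n, f(..., f(a_1, a_0)))."""
    r = a0
    # The list is written [a_n; ...; a_1], so a_1 is its last element:
    # walk it from the right to apply f to a_1 first and to a_n last.
    for a in reversed(items):
        r = f(a, r)
    return r


def sum_list(l):
    # Counterpart of "fold add l 0".
    return fold(lambda a, r: a + r, l, 0)


def map_list(g, l):
    # Prepending while folding from the right preserves the list order.
    return fold(lambda a, r: [g(a)] + r, l, [])


def filter_list(pred, l):
    # Keep an element only when the predicate holds for it.
    return fold(lambda a, r: [a] + r if pred(a) else r, l, [])
```

Since every derived function is a single call to fold, no other iteration construct is needed in this sketch, which matches the role fold plays in the MooML standard library.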

Note that all package properties, except is-installed, 
are given by the CUDF document to which 
a MooML program is applied. The input CUDF document 
describes the was-installed property of every package; it is 
the role of the MooML program to impose constraints on 
the possible is-installed properties of packages, and to 
calculate a score for any possible choice of is-installed 
properties of the packages. 

Once the constraint and criteria expressions are fully 
reduced, there is enough information to know whether or 
not the solution should be discarded (constraint evaluated 
to false). If it is not discarded, the criteria values 
together form a tuple that can be lexicographically 
compared with tuples coming from other candidate solutions to 
determine which of the two is to be preferred. Of course, the 
lexicographic order must take into account the "polarity" 
of each criterion, i.e., whether it was a minimize or a maximize 
request. 
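
The selection step just described can be sketched in Python as follows. This is a hedged illustration rather than a prescribed algorithm: the function names, the string tags "minimize" and "maximize", and the representation of a candidate as a (constraint value, measure tuple) pair are all assumptions of the sketch.

```python
def ranking_key(measures, polarities):
    """Turn a tuple of criteria values into a key for which smaller is better."""
    # Negating each maximized criterion lets a single ascending
    # lexicographic comparison handle both polarities.
    return tuple(-m if pol == "maximize" else m
                 for m, pol in zip(measures, polarities))


def best_solution(candidates, polarities):
    """candidates: list of (constraint_ok, measures) pairs, one per solution."""
    # Solutions whose constraint evaluated to false are discarded first.
    acceptable = [m for ok, m in candidates if ok]
    if not acceptable:
        return None
    # Python compares tuples lexicographically, first criterion first.
    return min(acceptable, key=lambda m: ranking_key(m, polarities))
```

With polarities ("maximize", "minimize"), the measure tuple (5, 250) is preferred over (5, 400) and over (3, 100): the first criterion dominates, and the second only breaks ties.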

Types are not explicitly given in the syntax of the 
language, because they can be reconstructed in the style of 
Damas-Milner [5], obtaining principal types. The only source 
of ambiguity in the type system is record labels which, due 
to the CUDF ability to declare extra properties, may not be 
sufficient to unambiguously determine record types. While 
there seem to be no obstacles to extending the type system 
to account for them in the style of Remy [13], we have 
preferred to provide optional type ascriptions in the concrete 
syntax (not shown in Figure 5) to disambiguate the rare 
ambiguous cases. 

4.2 Examples 

MooML is expressive enough to account for all usage 
scenarios presented in Section 2, as we show in the 
following. The need for program simplicity on the side of solver 
implementers will be addressed by the partial evaluation mechanism in the 
next section. 

Example 1 (minimize total installation size). 
The "Hello, World!" equivalent in MooML is likely to be the 
widespread policy of minimizing the total installation size, 
very useful for embedded or otherwise constrained systems. 
It can be expressed as: 

let size pl = 
  sum (map (fun p -> p.installed-size) pl) 
minimize size 
  (filter (fun p -> p.is-installed) u) 

where sum is a library function summing up integers. The 
program simply states that the score to be minimized is the 
sum of the installed-size value (an extra property with the 
obvious meaning) of all packages installed in the proposed 
solution. 

Example 2 (maximize package "freshness"). 
The scenario requiring to maximize the number of packages 
installed at their most recent version can be expressed as 
follows. 

let is-recent p = 
  forall 
    (fun q -> (q.name != p.name) 
              || (q.version <= p.version)) 
    u 
maximize cardinality 
  (fun p -> p.is-installed && is-recent p) u 

is-recent is used as an auxiliary function to check whether 
a given package (given as its record) is the most recent 
version of all equally named packages: its implementation relies 
on forall, which checks the truth of a boolean predicate 
over a list of items (in this case, the package universe u). 
Complementarily, cardinality counts the number of times 
a predicate is true over a list; in the given optimization 
criterion, it is used to require the maximization of "recent" 
packages. 
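
To make the freshness criterion concrete, here is a small Python sketch of it. It is an illustration under stated assumptions, not the MooML evaluator: packages are modelled as plain dictionaries, the universe is passed explicitly instead of being bound to u, and the toy universe at the end is invented for the example.

```python
def forall(pred, items):
    # True when the predicate holds for every item of the list.
    return all(pred(x) for x in items)


def cardinality(pred, items):
    # Number of items for which the predicate holds.
    return sum(1 for x in items if pred(x))


def is_recent(p, universe):
    # p is recent when no equally named package has a higher version.
    return forall(
        lambda q: q["name"] != p["name"] or q["version"] <= p["version"],
        universe,
    )


def freshness(universe):
    # Score to be maximized: installed packages at their most recent version.
    return cardinality(
        lambda p: p["is-installed"] and is_recent(p, universe), universe
    )


# Hypothetical universe: foo is installed at its newest version, bar is not.
universe = [
    {"name": "foo", "version": 1, "is-installed": False},
    {"name": "foo", "version": 2, "is-installed": True},
    {"name": "bar", "version": 3, "is-installed": True},
    {"name": "bar", "version": 4, "is-installed": False},
]
print(freshness(universe))
```

On this universe only foo at version 2 counts towards the score, since bar is installed at version 3 while version 4 is available.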

Example 3 (flexible APT pinning). 
APT pinning can be encoded in at least a couple of 
different ways using MooML, depending on the desired goal. 
A first possibility is to encode the exact semantics of 
pinning, so that the only acceptable solutions will be those 
potentially returned by a pinning implementation. In essence, 
pinning works at the package choice level, ensuring that among 
all available versions of a given package, the one with the 
highest pin priority is installed. While pin priorities 
themselves can be assigned using MooML (see Section 5), if 
we assume that each package comes with an extra property 
pin-priority, we can encode pinning semantics as follows: 

let max-pin p = 
  max (map (fun z -> z.pin-priority) 
           (filter (fun q -> q.name == p.name) u)) 
constraint forall (fun p -> p.is-installed 
                            && p.pin-priority = max-pin p) u 

Given that this strict semantics is a well-known cause of 
APT incompleteness [8], a more "flexible" pinning encoding 
can be obtained by requiring to maximize the number of 
packages at maximal pin priority: 

let max-pin p = (* as above *) 
maximize cardinality 
  (fun p -> p.is-installed 
            && p.pin-priority = max-pin p) u 

An even more flexible metric over APT pinning can be 
obtained by minimizing the total difference between maximal 
and actual pin priorities as follows: 

let max-pin p = (* as above *) 
minimize sum 
  (map (fun p -> if p.is-installed 
                 then max-pin p - p.pin-priority 
                 else 0) u) 
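
The two flexible pinning metrics can be sketched in Python as below. This is only an illustration: the dictionary representation, the function names at_max_pin and pin_distance, and the priorities 500 and 990 in the toy universe are assumptions of the sketch, not values taken from a real APT configuration.

```python
def max_pin(p, universe):
    # Highest pin priority among all versions of the package named like p.
    return max(q["pin-priority"] for q in universe if q["name"] == p["name"])


def at_max_pin(universe):
    # To be maximized: installed packages sitting at maximal pin priority.
    return sum(
        1 for p in universe
        if p["is-installed"] and p["pin-priority"] == max_pin(p, universe)
    )


def pin_distance(universe):
    # To be minimized: total gap between maximal and actual pin priority,
    # counted over installed packages only.
    return sum(
        max_pin(p, universe) - p["pin-priority"] if p["is-installed"] else 0
        for p in universe
    )


# Hypothetical universe: foo is installed at a version with a lower priority.
universe = [
    {"name": "foo", "version": 1, "pin-priority": 500, "is-installed": True},
    {"name": "foo", "version": 2, "pin-priority": 990, "is-installed": False},
    {"name": "bar", "version": 1, "pin-priority": 990, "is-installed": True},
]
print(at_max_pin(universe), pin_distance(universe))
```

The first metric only records that foo misses its best version, whereas the second also records by how much, which is what makes it the finer-grained of the two.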



Example 4 (priority to security updates). 
The scenario which requires to prioritize security upgrades 
over any other criteria can be encoded straightforwardly by 
relying on MooML's lexicographic ordering over solution 
measures. In the following example it is combined with the 
freshness criterion of Example 2. 

let is-recent p = (* as above *) 
maximize cardinality 
  (fun p -> p.is-installed && not p.was-installed 
            && p.is-security-fix) u 
maximize cardinality 
  (fun p -> p.is-installed && is-recent p) u 

Note that we explicitly require the package to be newly 
installed before verifying whether it is a security fix (extra 
property is-security-fix); this way we ensure the security fix 
is being delivered with the proposed solution. Lexicographic 
ordering ensures that solutions with a higher number of 
security fixes being delivered will be preferred, no matter the 
total freshness. (How to improve the example to ensure that 
no past security fixes get removed by downgrades is left as 
an exercise.) 

Example 5 (minimize multiple versions). 
In this example we wish to minimize the number of packages 
that exist in multiple versions in the final installation. 

let number-versions p = length 
  (filter (fun q -> q.is-installed && 
                    p.name = q.name) 
          u) 
minimize cardinality 
  (fun p -> p.is-installed && 
            number-versions p > 1) u 

The function number-versions applied to package p 
calculates the number of installed packages with the same name as 
the package p. We minimize the number of installed 
packages for which the function number-versions returns a 
value strictly greater than 1. 
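
A Python rendering of this criterion may help to see what is being counted. It is a sketch under assumptions: packages are plain dictionaries, the universe is passed explicitly, and the toy data are invented.

```python
def number_versions(p, universe):
    # Number of installed packages sharing the name of p.
    return len(
        [q for q in universe if q["is-installed"] and q["name"] == p["name"]]
    )


def multi_version_count(universe):
    # To be minimized: installed packages that coexist with another
    # installed version of the same name.
    return sum(
        1 for p in universe
        if p["is-installed"] and number_versions(p, universe) > 1
    )


# Hypothetical universe: two versions of foo are installed side by side.
universe = [
    {"name": "foo", "version": 1, "is-installed": True},
    {"name": "foo", "version": 2, "is-installed": True},
    {"name": "bar", "version": 1, "is-installed": True},
    {"name": "baz", "version": 1, "is-installed": False},
]
print(multi_version_count(universe))
```

Note that the measure counts package records rather than package names, so the two installed versions of foo contribute two units to the score.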

5. PARTIAL EVALUATION 

In their full generality, MooML programs can be too 
complex for dependency solvers to handle, or at least require 
nontrivial implementation effort to develop a full MooML 
evaluator. To address this shortcoming, MooML has been 
designed to be a good subject for partial evaluation, which 
processes fully general MooML programs and returns "simpler" 
programs, ideally more suitable for digestion by dependency 
solvers. More precisely (see Figure 6), MooML partial 
evaluation is applied to a program p, which belongs to a CUDF 
document c, and returns two new entities: a new program p' 
and a CUDF transformer applicable to "c-like" CUDF 
documents, intuitively documents sharing the same extra 
properties as c. Once applied to c, the transformer returns a new 
document c', to which p' belongs. Partial evaluation enjoys 
the property that the evaluation of p in the context of c 
returns the same result (constraint and measure tuple) as 
the evaluation of p' in the context of c'. The advantage of 
p' over p is that it is potentially simpler, in the sense that 
it can be implemented by ignoring significant parts of the 
MooML language. However, in the worst case, the partial 
evaluator may not be able to do any simplification. 

Figure 6: Partial evaluation and its properties 

The guiding principle of MooML's partial evaluation is to 
pre-compute all sub-expressions that depend on the upgrade 
scenario, but not on the upgrade solution, and to "save" them 
as fresh package properties. As a consequence, p' is 
obtained by substituting complex sub-expressions with accesses 
to (fresh) properties, and c' is obtained by adding (fresh) 
properties. To characterize a little more formally the 
sub-expressions that are good partial evaluation candidates, we 
first define an equality relation which relates all package 
statuses equal up to is-installed: 

Definition 1 (sibling package lists). Two package 
lists $l_1, l_2$ are siblings, written $l_1 \cong l_2$, if 
$\mathrm{Dom}(l_1) = \mathrm{Dom}(l_2)$ 
(i.e., they contain the same packages), and for each 
$(p,v) \in \mathrm{Dom}(l_1)$ we have that $l_1(p,v)$ equals $l_2(p,v)$ 
except possibly the value of the is-installed property. 
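
The sibling relation of Definition 1 can be checked mechanically. The Python sketch below assumes, purely for illustration, that a package list is represented as a dictionary mapping (name, version) pairs to property dictionaries; the function name siblings is likewise an assumption.

```python
def siblings(l1, l2):
    """Check whether two package lists are equal up to is-installed."""
    # Same domain: both lists describe exactly the same (name, version) pairs.
    if l1.keys() != l2.keys():
        return False

    def strip(record):
        # Forget the one property that siblings are allowed to disagree on.
        return {k: v for k, v in record.items() if k != "is-installed"}

    return all(strip(l1[key]) == strip(l2[key]) for key in l1)


# Hypothetical scenario: two candidate solutions for the same universe.
before = {("foo", 1): {"was-installed": True, "is-installed": True}}
after = {("foo", 1): {"was-installed": True, "is-installed": False}}
print(siblings(before, after))
```

Two candidate solutions for the same upgrade scenario are siblings by construction, since they may only differ in which packages they propose to install.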

Then, we capture partially evaluable (sub-)expressions with the 
notion of local expressions. In the following definition we 
will make use of a mathematical semantics of the MooML 
language (the formal definition of which is omitted from this 
paper). When e is a MooML expression and $\sigma$ an evaluation 
environment mapping identifiers to semantic values, then 
$[\![e]\!]\sigma$ denotes the semantic object obtained by evaluating e 
in the environment $\sigma$. 

Definition 2 (local expressions). An expression e 
of type package -> t, and which does not have any unbound 
identifiers besides r and u, is called local if for all packages 
p, for all package lists $l_1, l_2$, and request $r_0$ such that 
$l_1 \cong l_2$ and $p \in l_1$, $p \in l_2$, we have that 

$[\![e]\!][u \mapsto l_1, r \mapsto r_0](p) = [\![e]\!][u \mapsto l_2, r \mapsto r_0](p)$ 

Intuitively, local expressions are all those expressions whose 
evaluation does not depend on the is-installed values of 
packages coming from the package universe; note that ex- 
pressions accessing the is-installed property of their ar- 
gument can be local nevertheless. As stated in Definition 2, 
the expression e must not refer to any previously defined 
function, but this is not really a restriction as we can always 
inline all function definitions (since the language does not 
allow for recursive definitions). 

We extend the MooML type system in order to 
determine a set of expressions that are local in the sense of 
Definition 2. The extension is straightforward and in the 
style of Volpano [18]. The record type gets split into safe 
and unsafe records, with type instantiation that allows 
"casting down" safe to unsafe; complementarily, the typing of 
record projection is changed so that projections explicitly 
accessing the is-installed property are typed as projections 
from unsafe records. The 
intuition is that functional expressions having a principal type 
with a safe record argument are guaranteed not to access its 
is-installed property. 

Equipped with the above typing machinery, each MooML 
sub-expression (no matter where it appears) that has type 
package -> t, for some t, can be tested for locality as 
follows: 

1. if it can be typed under the premise that u is a list of 
safe packages, then the expression is local 

(a) if, moreover, its principal type is an arrow from 
safe packages to something, the expression is fully 
determined without the candidate solution 

(b) otherwise, the expression depends on the property 
is-installed of its sole argument 

2. otherwise the expression is not local 

Case (1a) is the luckiest: the sub-expression can be 
pre-computed on all packages of the universe, its value stored 
in a fresh property name (to be declared in the preamble), 
and replaced by an access to the fresh property. Case (1b) 
requires the additional effort of (statically) computing the two 
possible values of the sub-expression, according to the 
possible values of is-installed, and of tweaking the program to 
look up one or the other fresh property according to the actual 
is-installed value at runtime. Case (2) is the worst case, 
where no partial evaluation is possible due to non-locality. 
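
The treatment of case (1b) can be sketched in Python as follows. This is a hedged illustration rather than the actual partial evaluator: the names precompute, residual, all_solutions, and newly_installed_size, the property names fresh1 and fresh2, and the dictionary representation of packages are all assumptions, and the sketch only handles expressions that look at their own argument.

```python
import copy
from itertools import product


def precompute(expr, universe, fresh_true, fresh_false):
    """Play the CUDF transformer: store both possible values of a local
    expression as two fresh properties of every package."""
    transformed = copy.deepcopy(universe)
    for p in transformed:
        # Evaluate the expression once per possible is-installed value.
        values = {flag: expr({**p, "is-installed": flag})
                  for flag in (True, False)}
        p[fresh_true], p[fresh_false] = values[True], values[False]
    return transformed


def residual(fresh_true, fresh_false):
    # The simplified expression: a lookup driven by is-installed at runtime.
    return lambda p: p[fresh_true] if p["is-installed"] else p[fresh_false]


def all_solutions(universe):
    """Enumerate every assignment of is-installed over a toy universe."""
    for flags in product([True, False], repeat=len(universe)):
        yield [{**p, "is-installed": f} for p, f in zip(universe, flags)]


def newly_installed_size(p):
    # Hypothetical case (1b) expression: size of packages that the
    # solution installs and that were not installed before.
    if p["is-installed"] and not p["was-installed"]:
        return p["installed-size"]
    return 0
```

Case (1a) is the degenerate situation in which the two stored values always coincide, so that a single fresh property suffices and the residual expression no longer needs to consult is-installed.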

Example 6. To demonstrate partial evaluation in 
practice we reconsider Example 2. It contains two expressions 
having types package -> t for some t. The first one, a 
sub-expression of the is-recent definition body, belongs to case 
(1a) (unrelated to the solution), while the second one needs the 
property is-installed of its argument, still being local. 
Partial evaluation will rewrite the MooML program leading to 
something like: 

let is-recent p = forall (fun q -> q.fresh0) u 
maximize cardinality 
  (fun p -> if p.is-installed 
            then p.fresh1 
            else p.fresh2) u 

where fresh0, fresh1, fresh2 are fresh properties defined 
as follows. fresh0 will be true for all "most recent 
packages", fresh1 will inherit from fresh0, and fresh2 will be 
the constant false. 

The limits of the partial evaluation approach are demon- 
strated by the example of minimizing multiple installed ver- 
sions of packages (Example 5). In that case the partial evalu- 
ator does not bring any advantage since everything depends 
on the final installation status of the packages, and there is 
no additional information that can be pre-computed 
independently of the installation status of the other packages in 
the universe. A similar case is the maximization of the num- 
ber of installed packages that have their recommends (which 
is a weak, non-mandatory form of package dependency) sat- 
isfied. 

A concluding noteworthy scenario is a reprise of APT 
pinning handling (see Example 3). No matter how "strictly" 
pinning gets implemented in MooML, partial evaluation 
makes it possible to relax the requirement that pin priorities 
reach MooML pre-computed, with neither implementation 
burden nor performance loss for dependency solvers. The idea 
is to store in MooML the rules to assign pin priorities to 
packages on the usual basis (origin suite, package name, 
package version, ...) relying on appropriate extra properties 
and suitable standard library functions (like regular 
expression matching). If pinning assignment is encoded as 
functions from packages to integers (and it will hardly be 
otherwise), there is no reason for the implementing expression to 
access the is-installed property, given that pinning rules 
are static. Hence, the resulting sub-expressions are local, 
falling under case (1a), and will be completely removed during partial 
evaluation, returning a CUDF document such as those 
assumed by Example 3. 



6. CONCLUSION 

A request to alter the installation of component-based 
software collections as large as FOSS distributions can have 
a daunting number of satisfying answers. To choose the 
"best" solution among them, state-of-the-art package 
managers implement ad-hoc heuristics and offer preference 
mechanisms of limited expressiveness. In this paper we presented 
an architecture to specify user preferences about FOSS 
packages which is both independent of specific package 
managers or distributions and expressive enough to encode 
several preference scenarios. The architecture is composed of a 
format to encode upgrade scenarios (CUDF) and a 
functional language to encode user preferences (MooML). 

Future work is planned in several directions. First of all, 
while the syntax and formal semantics of CUDF have been 
studied already, various properties of MooML still need to be 
investigated in more detail. In particular we plan to 
characterize various subsets of MooML that correspond, after 
partial evaluation, to language fragments which are best suited 
for different encodings of package upgrade problems (SAT, 
PBO, constraint programming, etc.). 

We also plan to carry partial evaluation further in the 
direction of getting rid of data types at partial evaluation 
stage, so that only integers (which are the preferred data 
type for the optimization community) remain after that. 

Finally, the mentioned corpus of upgrade scenarios coming 
from different distributions is currently being collected with 
the final goal of organizing a recurrent dependency solving 
competition. Ideally, such a forum will become a venue 
where the package manager developer community meets the 
research community on constraint solving. Both 
communities could profit from this: package managers could use 
complete and more powerful dependency solving tools, and the 
research community would gain access to a large corpus of 
real-life optimization problems of non-trivial size. 